04. Collecting your Data

Data Guide

A simple training dataset has been provided in this project's repository. This dataset will allow you to verify that your segmentation network is at least semi-functional. However, if you're interested in improving your score, you may want to collect additional training data. To do so, follow the steps below.

The data directory is organized as follows:

data/runs - contains the results of prediction runs
data/train/images - contains images for the training set
data/train/masks - contains masked (labeled) images for the training set
data/validation/images - contains images for the validation set
data/validation/masks - contains masked (labeled) images for the validation set
data/weights - contains trained TensorFlow models
data/raw_sim_data/train/run1 - contains raw images recorded from the simulator for the training set
data/raw_sim_data/validation/run1 - contains raw images recorded from the simulator for the validation set
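
If any of these directories are missing, you can create them by hand or with a short script. The sketch below is a minimal example (not part of the project code) that builds the layout listed above; it assumes you run it from the repository root:

import os

# Directory layout listed above, relative to the repository root.
EXPECTED_DIRS = [
    "data/runs",
    "data/train/images",
    "data/train/masks",
    "data/validation/images",
    "data/validation/masks",
    "data/weights",
    "data/raw_sim_data/train/run1",
    "data/raw_sim_data/validation/run1",
]

for d in EXPECTED_DIRS:
    # exist_ok avoids errors for folders the repository already contains.
    os.makedirs(d, exist_ok=True)
    print("ok:", d)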

Training Set

  1. Run QuadSim
  2. Click the DL Training button
  3. Set patrol points, path points, and spawn points, as described in the Data Collection Guide in the next lesson
  4. With the simulator running, press "r" to begin recording.
  5. In the file selection menu, navigate to the data/raw_sim_data/train/run1 directory
  6. Optional: to speed up data collection, press "9" (the keys "1" through "9" control the collection speed, with lower numbers recording more slowly)
  7. When you have finished collecting data, hit "r" to stop recording.
  8. To reset the simulator, hit "<esc>"
  9. To collect multiple runs, create directories data/raw_sim_data/train/run2, data/raw_sim_data/train/run3, and so on, then repeat the above steps (see the sketch after this list).
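
If you plan to record several runs, a short script can create the extra run folders ahead of time. The sketch below is only an example; the number of runs is an assumption you should adjust:

import os

# Create data/raw_sim_data/train/run2 ... runN for additional recording sessions.
# NUM_RUNS is an arbitrary example value; set it to however many runs you plan to record.
NUM_RUNS = 3

for i in range(2, NUM_RUNS + 1):
    run_dir = os.path.join("data", "raw_sim_data", "train", "run{}".format(i))
    os.makedirs(run_dir, exist_ok=True)
    print("created", run_dir)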

Validation Set

To collect the validation set, repeat the steps above, but use the directory data/raw_sim_data/validation rather than data/raw_sim_data/train.

Image Preprocessing

Before the network is trained, the images first need to undergo a preprocessing step. This step transforms the depth masks from the sim into binary masks suitable for training a neural network. It also converts the images from .png to .jpeg to create a reduced-size dataset suitable for uploading to the Udacity Workspace or the AWS instance.
To run preprocessing:

$ python preprocess_ims.py

Note: If your data is stored as suggested in the steps above, this script should run without error.
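
preprocess_ims.py performs this conversion for you; the sketch below only illustrates the general idea (thresholding a mask image into a binary mask and re-encoding a .png as .jpeg) and is not the project's actual implementation. The file names and threshold value are placeholders invented for the example:

import numpy as np
from PIL import Image

# Illustrative only: turn a grayscale mask into a 0/255 binary mask.
# "example_mask.png" and the threshold of 128 are placeholders, not values
# used by preprocess_ims.py.
mask = np.array(Image.open("example_mask.png").convert("L"))
binary = (mask > 128).astype(np.uint8) * 255
Image.fromarray(binary).save("example_mask_binary.png")

# Illustrative only: re-encode a .png camera image as .jpeg to shrink the dataset.
Image.open("example_image.png").convert("RGB").save("example_image.jpeg", quality=90)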

Important Note 1:

Running preprocess_ims.py does not delete files in the processed_data folder. This means that if you leave images in processed_data and collect a new dataset, some of the data in processed_data will be overwritten and some will be left as is. It is recommended to delete the train and validation folders inside processed_data (or the entire folder) before running preprocess_ims.py on a new set of collected data.
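
One way to do that cleanup is sketched below. The output folder is referred to as both processed_data and data/processed_ims in these notes, so set PROCESSED_DIR to whatever path your copy of preprocess_ims.py actually writes to:

import os
import shutil

# Adjust this to the processed-data path used by your preprocess_ims.py.
PROCESSED_DIR = "data/processed_ims"

for sub in ("train", "validation"):
    path = os.path.join(PROCESSED_DIR, sub)
    if os.path.isdir(path):
        # Remove stale processed output before regenerating it.
        shutil.rmtree(path)
        print("removed", path)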

Important Note 2:

The notebook and supporting code assume your training and validation data are in data/train and data/validation. After you run preprocess_ims.py, you will have a new train folder, and possibly a new validation folder, inside data/processed_ims. Rename or move data/train and data/validation, then move data/processed_ims/train and data/processed_ims/validation into data/.
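
The snippet below is one way to perform that shuffle, using the paths named in this note; the "_old" backup suffix is just an example choice:

import os
import shutil

for sub in ("train", "validation"):
    current = os.path.join("data", sub)
    backup = current + "_old"
    processed = os.path.join("data", "processed_ims", sub)

    # Back up the existing folder so nothing is lost.
    if os.path.isdir(current):
        shutil.move(current, backup)
        print("backed up", current, "->", backup)

    # Promote the freshly processed folder into place.
    if os.path.isdir(processed):
        shutil.move(processed, current)
        print("moved", processed, "->", current)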

Important Note 3:

Merging multiple train or validation sets may be difficult, so it is recommended that your data choices be determined by what you include in raw_sim_data/train, possibly with many different run folders in that directory. You can create a temporary folder in data/ to store raw run data you don't currently want to use but that may be useful later. Choose which run_x folders to include in raw_sim_data/train and raw_sim_data/validation, then run preprocess_ims.py from within the 'code/' directory to generate your new training and validation sets.
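
For example, a small script along these lines can park runs you are not using; the data/unused_runs folder name and the run names below are made up for the example:

import os
import shutil

# Arbitrary holding folder for raw runs you want to keep but not train on yet.
HOLDING_DIR = "data/unused_runs"
# Example run names; choose the ones you actually want to set aside.
RUNS_TO_SET_ASIDE = ["run2", "run3"]

os.makedirs(HOLDING_DIR, exist_ok=True)

for run in RUNS_TO_SET_ASIDE:
    src = os.path.join("data", "raw_sim_data", "train", run)
    if os.path.isdir(src):
        shutil.move(src, os.path.join(HOLDING_DIR, run))
        print("set aside", src)

# Once raw_sim_data/train and raw_sim_data/validation contain only the runs you
# want, regenerate the processed sets from within the code/ directory:
#   $ cd code && python preprocess_ims.py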